Goto

Collaborating Authors

 Joensuu


k-Means Clustering with Distance-Based Privacy

Neural Information Processing Systems

In this paper, we initiate the study of Euclidean clustering with Distance-based privacy. Distance-based privacy is motivated by the fact that it is often only needed to protect the privacy of exact, rather than approximate, locations.


K-Medoids For K-Means Seeding

James Newling, François Fleuret

Neural Information Processing Systems

We show experimentally that the algorithm clarans of Ng and Han (1994) finds better K -medoids solutions than the V oronoi iteration algorithm of Hastie et al. (2001). This finding, along with the similarity between the V oronoi iteration algorithm and Lloyd's K -means algorithm, motivates us to use clarans as a K -means initializer. We show that clarans outperforms other algorithms on 23/23 datasets with a mean decrease over k-means-++ (Arthur and V assilvitskii, 2007) of 30% for initialization mean squared error (MSE) and 3% for final MSE. We introduce algorithmic improvements to clarans which improve its complexity and runtime, making it a viable initialization scheme for large datasets.





Mitigating Spurious Correlations in Patch-wise Tumor Classification on High-Resolution Multimodal Images

Asaad, Ihab, Shadaydeh, Maha, Denzler, Joachim

arXiv.org Artificial Intelligence

Patch-wise multi-label classification provides an efficient alternative to full pixel-wise segmentation on high-resolution images, particularly when the objective is to determine the presence or absence of target objects within a patch rather than their precise spatial extent. This formulation substantially reduces annotation cost, simplifies training, and allows flexible patch sizing aligned with the desired level of decision granularity. In this work, we focus on a special case, patch-wise binary classification, applied to the detection of a single class of interest (tumor) on high-resolution multimodal nonlinear microscopy images. We show that, although this simplified formulation enables efficient model development, it can introduce spurious correlations between patch composition and labels: tumor patches tend to contain larger tissue regions, whereas non-tumor patches often consist mostly of background with small tissue areas. We further quantify the bias in model predictions caused by this spurious correlation, and propose to use a debiasing strategy to mitigate its effect. Specifically, we apply GERNE, a debiasing method that can be adapted to maximize worst-group accuracy (WGA). Our results show an improvement in WGA by approximately 7% compared to ERM for two different thresholds used to binarize the spurious feature. This enhancement boosts model performance on critical minority cases, such as tumor patches with small tissues and non-tumor patches with large tissues, and underscores the importance of spurious correlation-aware learning in patch-wise classification problems.



gACSON software for automated segmentation and morphology analyses of myelinated axons in 3D electron microscopy

Behanova, Andrea, Abdollahzadeh, Ali, Belevich, Ilya, Jokitalo, Eija, Sierra, Alejandra, Tohka, Jussi

arXiv.org Artificial Intelligence

Background and Objective: Advances in electron microscopy (EM) now allow three-dimensional (3D) imaging of hundreds of micrometers of tissue with nanometer-scale resolution, providing new opportunities to study the ultra-structure of the brain. In this work, we introduce a freely available Matlab-based gACSON software for visualization, segmentation, assessment, and morphology analysis of myelinated axons in 3D-EM volumes of brain tissue samples. Methods: The software is equipped with a graphical user interface (GUI). It automatically segments the intra-axonal space of myelinated axons and their corresponding myelin sheaths and allows manual segmentation, proofreading, and interactive correction of the segmented components. Results: We illustrate the use of the software by segmenting and analyzing myelinated axons in six 3D-EM volumes of rat somatosensory cortex after sham surgery or traumatic brain injury (TBI). Our results suggest that the equivalent diameter of myelinated axons in somatosensory cortex was decreased in TBI animals five months after the injury. Conclusions: Our results indicate that gACSON is a valuable tool for visualization, segmentation, assessment, and morphology analysis of myelinated axons in 3D-EM volumes. Introduction Assessing the structure of the brain is critical to better understanding its normal and abnormal functioning. Advances in electron microscopy (EM) now allow three-dimensional (3D) imaging of hundreds of micrometers of tissue with nanometer-scale resolution, providing new opportunities to study the ultrastructure of the brain [1, 2]. Quantitative analysis of 3D-EM data, such as morphological assessment of ultrastructure, spatial distribution or connectivity of cells, requires the instance segmentation of individual ultrastructural components [3, 4, 5]. Performing this segmentation manually is tedious, if not impossible, due to the large size and enormous number of components in typical 3D-EM data.


Mind the Gaps: Auditing and Reducing Group Inequity in Large-Scale Mobility Prediction

Kumar, Ashwin, Zhang, Hanyu, Schweidel, David A., Yeoh, William

arXiv.org Artificial Intelligence

Next location prediction underpins a growing number of mobility, retail, and public-health applications, yet its societal impacts remain largely unexplored. In this paper, we audit state-of-the-art mobility prediction models trained on a large-scale dataset, highlighting hidden disparities based on user demographics. Drawing from aggregate census data, we compute the difference in predictive performance on racial and ethnic user groups and show a systematic disparity resulting from the underlying dataset, resulting in large differences in accuracy based on location and user groups. To address this, we propose Fairness-Guided Incremental Sampling (FGIS), a group-aware sampling strategy designed for incremental data collection settings. Because individual-level demographic labels are unavailable, we introduce Size-Aware K-Means (SAKM), a clustering method that partitions users in latent mobility space while enforcing census-derived group proportions. This yields proxy racial labels for the four largest groups in the state: Asian, Black, Hispanic, and White. Built on these labels, our sampling algorithm prioritizes users based on expected performance gains and current group representation. This method incrementally constructs training datasets that reduce demographic performance gaps while preserving overall accuracy. Our method reduces total disparity between groups by up to 40\% with minimal accuracy trade-offs, as evaluated on a state-of-art MetaPath2Vec model and a transformer-encoder model. Improvements are most significant in early sampling stages, highlighting the potential for fairness-aware strategies to deliver meaningful gains even in low-resource settings. Our findings expose structural inequities in mobility prediction pipelines and demonstrate how lightweight, data-centric interventions can improve fairness with little added complexity, especially for low-data applications.


Rectifying Shortcut Behaviors in Preference-based Reward Learning

Ye, Wenqian, Zheng, Guangtao, Zhang, Aidong

arXiv.org Artificial Intelligence

In reinforcement learning from human feedback, preference-based reward models play a central role in aligning large language models to human-aligned behavior. However, recent studies show that these models are prone to reward hacking and often fail to generalize well due to over-optimization. They achieve high reward scores by exploiting shortcuts, that is, exploiting spurious features (e.g., response verbosity, agreeable tone, or sycophancy) that correlate with human preference labels in the training data rather than genuinely reflecting the intended objectives. In this paper, instead of probing these issues one at a time, we take a broader view of the reward hacking problem as shortcut behaviors and introduce a principled yet flexible approach to mitigate shortcut behaviors in preference-based reward learning. Inspired by the invariant theory in the kernel perspective, we propose Preference-based Reward Invariance for Shortcut Mitigation (PRISM), which learns group-invariant kernels with feature maps in a closed-form learning objective. Experimental results in several benchmarks show that our method consistently improves the accuracy of the reward model on diverse out-of-distribution tasks and reduces the dependency on shortcuts in downstream policy models, establishing a robust framework for preference-based alignment.